--- title: "Airbnb Copenhagen" author: "Studygroup 8" date: "2020-09-19" output: html_document: theme: flatly highlight: zenburn number_sections: yes toc: yes toc_float: yes code_folding: show ---
Source: www.ontheluce.com
Copenhagen, the happiest city in the world, Who doesn’t want to go there! Happiness comes at a price though, you’ll need to find a place to stay. What better way to look for a place to stay than on AirBnB and us being students, would like to know what this will cost us. This is where the MAM program from LBS is already paying itself off. To find out what variables (e.g. room type, property type) influence the price most so we can predict the cost with a regression analysis. So our dream holiday can become reality.
The data we will be using for this comes directly from AirBnB. In this analysis we will only be using data on the city of Copenhagen. the data shows the listings as per June 26th 2020.
This analysis is divided into 4 parts:
Using AirBnB data for listings at the time of June 26th 2020, we predicted the costs of a stay in Copenhagen. The requirements were 4 nights with 2 persons, a private room and a minimum review score of 90 with at least 10 reviews. Our method was to run and exploratory data analysis to set the scene for the data that we would use to make our prediction. Once we had confirmed the important values that could have a significant influence on the price. we identified the following variables:
After running regression models with these variables we came to a model that had the most prediction power, model 7. With the specified criteria we made a prediction with 95% confidence that the cost for a median listing on Airbnb in Copenhagen is between 1,860.18 DKK and 1,938.58 DKK.
To get an idea of the data structure, an exploratory data analysis is conducted. The first steps are to look at the raw values and to clean the data afterwards. This gives a foundation to use in summary statistics. Finally, visualisations will give an impression of any correlations in variables and will give an outlook on variables that stand out.
First of all, the data needs to be loaded. The clean_names() function is used to make the values consistent and neat, to help with the data wrangling.
## Rows: 28,523
## Columns: 106
## Delimiter: ","
## chr [45]: listing_url, name, summary, space, description, experiences_offered, neighborho...
## dbl [40]: id, scrape_id, host_id, host_listings_count, host_total_listings_count, zipcode...
## lgl [16]: thumbnail_url, medium_url, xl_picture_url, host_is_superhost, host_has_profile_...
## date [ 5]: last_scraped, host_since, calendar_last_scraped, first_review, last_review
##
## Use `spec()` to retrieve the guessed column specification
## Pass a specification to the `col_types` argument to quiet this message
Having looked at the data, the following variables will be of importance throughout this analysis:
price : cost per night
cleaning_fee: cleaning fee
extra_people: charge for having more than 1 person
property_type: type of accommodation (House, Apartment, etc.)
room_type: Entire home/apt (guests have entire place to themselves) Private room (Guests have private room to sleep, all other rooms shared) Shared room (Guests sleep in room shared with others) number_of_reviews: Total number of reviews for the listing
review_scores_rating: Average review score (0 - 100)
longitude , latitude: geographical coordinates to help us locate the listing
neighborhood*: three variables on a few major neighborhoods in each city
The next step is to check the data for the properties of the variables and if there are any missing or NA values that need to be addressed.
## Rows: 28,523
## Columns: 106
## $ id <dbl> 6983, 26057, 26473, 29...
## $ listing_url <chr> "https://www.airbnb.co...
## $ scrape_id <dbl> 2.020063e+13, 2.020063...
## $ last_scraped <date> 2020-06-28, 2020-06-2...
## $ name <chr> "Copenhagen 'N Livin'"...
## $ summary <chr> "Lovely apartment loca...
## $ space <chr> "Beautiful and cosy ap...
## $ description <chr> "Lovely apartment loca...
## $ experiences_offered <chr> "none", "none", "none"...
## $ neighborhood_overview <chr> "Nice bars and cozy ca...
## $ notes <chr> NA, NA, NA, NA, "Pleas...
## $ transit <chr> "Bus 66 runs to the ce...
## $ access <chr> "Bedroom, living room,...
## $ interaction <chr> "We are usually at wor...
## $ house_rules <chr> "No smoking allowed! N...
## $ thumbnail_url <lgl> NA, NA, NA, NA, NA, NA...
## $ medium_url <lgl> NA, NA, NA, NA, NA, NA...
## $ picture_url <chr> "https://a0.muscache.c...
## $ xl_picture_url <lgl> NA, NA, NA, NA, NA, NA...
## $ host_id <dbl> 16774, 109777, 112210,...
## $ host_url <chr> "https://www.airbnb.co...
## $ host_name <chr> "Simon", "Kari", "Oliv...
## $ host_since <date> 2009-05-12, 2010-04-1...
## $ host_location <chr> "Copenhagen, Capital R...
## $ host_about <chr> "I'm currently working...
## $ host_response_time <chr> "N/A", "N/A", "within ...
## $ host_response_rate <chr> "N/A", "N/A", "100%", ...
## $ host_acceptance_rate <chr> "33%", "19%", "100%", ...
## $ host_is_superhost <lgl> FALSE, FALSE, FALSE, F...
## $ host_thumbnail_url <chr> "https://a0.muscache.c...
## $ host_picture_url <chr> "https://a0.muscache.c...
## $ host_neighbourhood <chr> "Nørrebro", "Indre By"...
## $ host_listings_count <dbl> 1, 1, 4, 1, 1, 1, 3, 1...
## $ host_total_listings_count <dbl> 1, 1, 4, 1, 1, 1, 3, 1...
## $ host_verifications <chr> "['email', 'phone', 'r...
## $ host_has_profile_pic <lgl> TRUE, TRUE, TRUE, TRUE...
## $ host_identity_verified <lgl> FALSE, FALSE, TRUE, FA...
## $ street <chr> "Copenhagen, Hovedstad...
## $ neighbourhood <chr> "Nørrebro", "Indre By"...
## $ neighbourhood_cleansed <chr> "Nrrebro", "Indre By",...
## $ neighbourhood_group_cleansed <lgl> NA, NA, NA, NA, NA, NA...
## $ city <chr> "Copenhagen", "Copenha...
## $ state <chr> "Hovedstaden", "Hoveds...
## $ zipcode <dbl> 2200, 2100, 1210, 1650...
## $ market <chr> "Copenhagen", "Copenha...
## $ smart_location <chr> "Copenhagen, Denmark",...
## $ country_code <chr> "DK", "DK", "DK", "DK"...
## $ country <chr> "Denmark", "Denmark", ...
## $ latitude <dbl> 55.68798, 55.69163, 55...
## $ longitude <dbl> 12.54571, 12.57459, 12...
## $ is_location_exact <lgl> TRUE, TRUE, TRUE, TRUE...
## $ property_type <chr> "Apartment", "House", ...
## $ room_type <chr> "Private room", "Entir...
## $ accommodates <dbl> 2, 6, 12, 2, 4, 3, 3, ...
## $ bathrooms <dbl> 1.0, 1.5, 2.5, 1.0, 1....
## $ bedrooms <dbl> 1, 4, 6, 1, 3, 1, 1, 2...
## $ beds <dbl> 1, 4, 7, 1, 3, 3, 2, 2...
## $ bed_type <chr> "Real Bed", "Real Bed"...
## $ amenities <chr> "{TV,\"Cable TV\",Wifi...
## $ square_feet <dbl> 97, NA, NA, NA, NA, 68...
## $ price <chr> "$365.00", "$2,398.00"...
## $ weekly_price <chr> NA, NA, "$17,513.00", ...
## $ monthly_price <chr> NA, NA, "$67,073.00", ...
## $ security_deposit <chr> "$0.00", "$5,000.00", ...
## $ cleaning_fee <chr> "$33.00", "$1,100.00",...
## $ guests_included <dbl> 1, 3, 1, 1, 1, 2, 2, 2...
## $ extra_people <chr> "$66.00", "$350.00", "...
## $ minimum_nights <dbl> 2, 3, 3, 7, 7, 2, 3, 6...
## $ maximum_nights <dbl> 15, 30, 31, 14, 31, 10...
## $ minimum_minimum_nights <dbl> 2, 3, 3, 3, 7, 2, 3, 6...
## $ maximum_minimum_nights <dbl> 2, 3, 3, 5, 7, 2, 3, 6...
## $ minimum_maximum_nights <dbl> 15, 30, 1125, 14, 1125...
## $ maximum_maximum_nights <dbl> 15, 30, 1125, 14, 1125...
## $ minimum_nights_avg_ntm <dbl> 2.0, 3.0, 3.0, 4.1, 7....
## $ maximum_nights_avg_ntm <dbl> 15, 30, 1125, 14, 1125...
## $ calendar_updated <chr> "5 months ago", "4 mon...
## $ has_availability <lgl> TRUE, TRUE, TRUE, TRUE...
## $ availability_30 <dbl> 29, 28, 29, 21, 0, 0, ...
## $ availability_60 <dbl> 59, 58, 59, 21, 0, 0, ...
## $ availability_90 <dbl> 89, 88, 89, 21, 0, 0, ...
## $ availability_365 <dbl> 89, 363, 172, 21, 0, 5...
## $ calendar_last_scraped <date> 2020-06-28, 2020-06-2...
## $ number_of_reviews <dbl> 168, 50, 293, 22, 90, ...
## $ number_of_reviews_ltm <dbl> 1, 4, 31, 2, 0, 0, 1, ...
## $ first_review <date> 2009-09-04, 2013-12-0...
## $ last_review <date> 2019-07-19, 2019-12-1...
## $ review_scores_rating <dbl> 96, 98, 91, 98, 94, 97...
## $ review_scores_accuracy <dbl> 10, 10, 10, 10, 10, 10...
## $ review_scores_cleanliness <dbl> 9, 10, 9, 10, 9, 10, 1...
## $ review_scores_checkin <dbl> 10, 10, 10, 10, 10, 10...
## $ review_scores_communication <dbl> 10, 10, 10, 10, 9, 10,...
## $ review_scores_location <dbl> 9, 10, 10, 10, 10, 10,...
## $ review_scores_value <dbl> 9, 10, 9, 10, 9, 9, 9,...
## $ requires_license <lgl> FALSE, FALSE, FALSE, F...
## $ license <lgl> NA, NA, NA, NA, NA, NA...
## $ jurisdiction_names <lgl> NA, NA, NA, NA, NA, NA...
## $ instant_bookable <lgl> FALSE, FALSE, FALSE, F...
## $ is_business_travel_ready <lgl> FALSE, FALSE, FALSE, F...
## $ cancellation_policy <chr> "moderate", "moderate"...
## $ require_guest_profile_picture <lgl> FALSE, FALSE, FALSE, F...
## $ require_guest_phone_verification <lgl> FALSE, FALSE, FALSE, F...
## $ calculated_host_listings_count <dbl> 1, 1, 1, 1, 1, 1, 1, 1...
## $ calculated_host_listings_count_entire_homes <dbl> 0, 1, 1, 1, 1, 1, 1, 1...
## $ calculated_host_listings_count_private_rooms <dbl> 1, 0, 0, 0, 0, 0, 0, 0...
## $ calculated_host_listings_count_shared_rooms <dbl> 0, 0, 0, 0, 0, 0, 0, 0...
## $ reviews_per_month <dbl> 1.28, 0.62, 2.48, 0.18...
| Name | listings |
| Number of rows | 28523 |
| Number of columns | 106 |
| _______________________ | |
| Column type frequency: | |
| character | 45 |
| Date | 5 |
| logical | 16 |
| numeric | 40 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_url | 0 | 1.00 | 33 | 37 | 0 | 28523 | 0 |
| name | 57 | 1.00 | 1 | 211 | 0 | 26907 | 0 |
| summary | 1096 | 0.96 | 1 | 1000 | 0 | 26986 | 0 |
| space | 11390 | 0.60 | 1 | 1000 | 0 | 16806 | 0 |
| description | 515 | 0.98 | 1 | 1000 | 0 | 27748 | 0 |
| experiences_offered | 0 | 1.00 | 4 | 4 | 0 | 1 | 0 |
| neighborhood_overview | 12407 | 0.57 | 1 | 1000 | 0 | 15520 | 0 |
| notes | 20832 | 0.27 | 1 | 1000 | 0 | 7366 | 0 |
| transit | 11316 | 0.60 | 1 | 1000 | 0 | 16714 | 0 |
| access | 15751 | 0.45 | 1 | 1000 | 0 | 11251 | 0 |
| interaction | 14111 | 0.51 | 1 | 1000 | 0 | 13728 | 0 |
| house_rules | 13332 | 0.53 | 1 | 1000 | 0 | 13703 | 0 |
| picture_url | 0 | 1.00 | 81 | 146 | 0 | 28282 | 0 |
| host_url | 0 | 1.00 | 37 | 43 | 0 | 25745 | 0 |
| host_name | 12 | 1.00 | 1 | 34 | 0 | 6415 | 0 |
| host_location | 93 | 1.00 | 2 | 152 | 0 | 863 | 0 |
| host_about | 14028 | 0.51 | 1 | 3550 | 0 | 12478 | 34 |
| host_response_time | 11 | 1.00 | 3 | 18 | 0 | 5 | 0 |
| host_response_rate | 11 | 1.00 | 2 | 4 | 0 | 44 | 0 |
| host_acceptance_rate | 11 | 1.00 | 2 | 4 | 0 | 100 | 0 |
| host_thumbnail_url | 11 | 1.00 | 55 | 106 | 0 | 25669 | 0 |
| host_picture_url | 11 | 1.00 | 57 | 109 | 0 | 25669 | 0 |
| host_neighbourhood | 8007 | 0.72 | 1 | 21 | 0 | 56 | 0 |
| host_verifications | 0 | 1.00 | 2 | 156 | 0 | 316 | 0 |
| street | 0 | 1.00 | 10 | 61 | 0 | 528 | 0 |
| neighbourhood | 0 | 1.00 | 5 | 14 | 0 | 21 | 0 |
| neighbourhood_cleansed | 0 | 1.00 | 5 | 25 | 0 | 11 | 0 |
| city | 13 | 1.00 | 1 | 26 | 0 | 136 | 0 |
| state | 24042 | 0.16 | 1 | 25 | 0 | 170 | 0 |
| market | 864 | 0.97 | 6 | 21 | 0 | 10 | 0 |
| smart_location | 0 | 1.00 | 10 | 35 | 0 | 158 | 0 |
| country_code | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
| country | 0 | 1.00 | 7 | 7 | 0 | 1 | 0 |
| property_type | 0 | 1.00 | 3 | 22 | 0 | 29 | 0 |
| room_type | 0 | 1.00 | 10 | 15 | 0 | 4 | 0 |
| bed_type | 0 | 1.00 | 5 | 13 | 0 | 5 | 0 |
| amenities | 0 | 1.00 | 2 | 1179 | 0 | 26634 | 0 |
| price | 0 | 1.00 | 5 | 10 | 0 | 611 | 0 |
| weekly_price | 25008 | 0.12 | 7 | 11 | 0 | 718 | 0 |
| monthly_price | 26971 | 0.05 | 7 | 11 | 0 | 476 | 0 |
| security_deposit | 13845 | 0.51 | 5 | 10 | 0 | 386 | 0 |
| cleaning_fee | 8968 | 0.69 | 5 | 9 | 0 | 430 | 0 |
| extra_people | 0 | 1.00 | 5 | 9 | 0 | 265 | 0 |
| calendar_updated | 0 | 1.00 | 5 | 13 | 0 | 82 | 0 |
| cancellation_policy | 0 | 1.00 | 8 | 27 | 0 | 4 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| last_scraped | 0 | 1.00 | 2020-06-26 | 2020-06-30 | 2020-06-27 | 5 |
| host_since | 11 | 1.00 | 2008-06-27 | 2020-06-26 | 2015-06-22 | 3379 |
| calendar_last_scraped | 0 | 1.00 | 2020-06-26 | 2020-06-30 | 2020-06-27 | 5 |
| first_review | 4968 | 0.83 | 2009-09-04 | 2020-06-28 | 2017-08-07 | 2582 |
| last_review | 4968 | 0.83 | 2011-08-10 | 2020-06-28 | 2019-07-29 | 1846 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| thumbnail_url | 28523 | 0 | NaN | : |
| medium_url | 28523 | 0 | NaN | : |
| xl_picture_url | 28523 | 0 | NaN | : |
| host_is_superhost | 11 | 1 | 0.10 | FAL: 25610, TRU: 2902 |
| host_has_profile_pic | 11 | 1 | 1.00 | TRU: 28445, FAL: 67 |
| host_identity_verified | 11 | 1 | 0.38 | FAL: 17790, TRU: 10722 |
| neighbourhood_group_cleansed | 28523 | 0 | NaN | : |
| is_location_exact | 0 | 1 | 0.79 | TRU: 22507, FAL: 6016 |
| has_availability | 0 | 1 | 1.00 | TRU: 28523 |
| requires_license | 0 | 1 | 0.00 | FAL: 28523 |
| license | 28523 | 0 | NaN | : |
| jurisdiction_names | 28523 | 0 | NaN | : |
| instant_bookable | 0 | 1 | 0.29 | FAL: 20360, TRU: 8163 |
| is_business_travel_ready | 0 | 1 | 0.00 | FAL: 28523 |
| require_guest_profile_picture | 0 | 1 | 0.01 | FAL: 28379, TRU: 144 |
| require_guest_phone_verification | 0 | 1 | 0.01 | FAL: 28327, TRU: 196 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 2.044063e+07 | 12261143.29 | 6.983000e+03 | 1.008379e+07 | 1.950297e+07 | 3.043587e+07 | 4.395635e+07 | <U+2587><U+2587><U+2587><U+2585><U+2586> |
| scrape_id | 0 | 1.00 | 2.020063e+13 | 0.00 | 2.020063e+13 | 2.020063e+13 | 2.020063e+13 | 2.020063e+13 | 2.020063e+13 | <U+2581><U+2581><U+2587><U+2581><U+2581> |
| host_id | 0 | 1.00 | 6.731400e+07 | 76172316.22 | 5.130000e+02 | 1.281774e+07 | 3.604205e+07 | 9.463162e+07 | 3.519509e+08 | <U+2587><U+2582><U+2581><U+2581><U+2581> |
| host_listings_count | 11 | 1.00 | 5.460000e+00 | 34.32 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 7.370000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| host_total_listings_count | 11 | 1.00 | 5.460000e+00 | 34.32 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 7.370000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| zipcode | 815 | 0.97 | 2.069760e+03 | 390.81 | 2.000000e+01 | 1.855750e+03 | 2.200000e+03 | 2.300000e+03 | 2.100000e+04 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| latitude | 0 | 1.00 | 5.568000e+01 | 0.02 | 5.562000e+01 | 5.567000e+01 | 5.568000e+01 | 5.570000e+01 | 5.573000e+01 | <U+2581><U+2583><U+2587><U+2587><U+2581> |
| longitude | 0 | 1.00 | 1.256000e+01 | 0.03 | 1.245000e+01 | 1.254000e+01 | 1.255000e+01 | 1.258000e+01 | 1.264000e+01 | <U+2581><U+2582><U+2587><U+2585><U+2582> |
| accommodates | 0 | 1.00 | 3.320000e+00 | 1.63 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.600000e+01 | <U+2587><U+2582><U+2581><U+2581><U+2581> |
| bathrooms | 12 | 1.00 | 1.080000e+00 | 0.28 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| bedrooms | 29 | 1.00 | 1.550000e+00 | 1.06 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.000000e+00 | 1.010000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| beds | 103 | 1.00 | 2.040000e+00 | 1.44 | 0.000000e+00 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 2.500000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| square_feet | 28130 | 0.01 | 7.218900e+02 | 576.10 | 0.000000e+00 | 1.200000e+02 | 7.640000e+02 | 1.076000e+03 | 2.799000e+03 | <U+2587><U+2587><U+2583><U+2581><U+2581> |
| guests_included | 0 | 1.00 | 1.520000e+00 | 1.06 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.000000e+00 | 1.600000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| minimum_nights | 0 | 1.00 | 3.850000e+00 | 18.07 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.100000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| maximum_nights | 0 | 1.00 | 6.206700e+02 | 553.19 | 1.000000e+00 | 1.500000e+01 | 1.125000e+03 | 1.125000e+03 | 9.999000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| minimum_minimum_nights | 0 | 1.00 | 3.840000e+00 | 18.08 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.100000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| maximum_minimum_nights | 0 | 1.00 | 4.120000e+00 | 19.29 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.100000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| minimum_maximum_nights | 0 | 1.00 | 6.669200e+02 | 548.00 | 1.000000e+00 | 2.000000e+01 | 1.125000e+03 | 1.125000e+03 | 9.999000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| maximum_maximum_nights | 0 | 1.00 | 6.703800e+02 | 547.39 | 1.000000e+00 | 2.000000e+01 | 1.125000e+03 | 1.125000e+03 | 9.999000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| minimum_nights_avg_ntm | 0 | 1.00 | 3.970000e+00 | 18.35 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.100000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| maximum_nights_avg_ntm | 0 | 1.00 | 6.684100e+02 | 547.20 | 1.000000e+00 | 2.000000e+01 | 1.125000e+03 | 1.125000e+03 | 9.999000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| availability_30 | 0 | 1.00 | 5.870000e+00 | 10.42 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 8.000000e+00 | 3.000000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2582> |
| availability_60 | 0 | 1.00 | 1.124000e+01 | 20.15 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.400000e+01 | 6.000000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2582> |
| availability_90 | 0 | 1.00 | 1.649000e+01 | 29.99 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.800000e+01 | 9.000000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| availability_365 | 0 | 1.00 | 4.947000e+01 | 99.23 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 3.500000e+01 | 3.650000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| number_of_reviews | 0 | 1.00 | 1.357000e+01 | 27.09 | 0.000000e+00 | 1.000000e+00 | 5.000000e+00 | 1.500000e+01 | 6.370000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| number_of_reviews_ltm | 0 | 1.00 | 2.740000e+00 | 6.64 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 3.000000e+00 | 3.760000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| review_scores_rating | 5447 | 0.81 | 9.516000e+01 | 6.83 | 2.000000e+01 | 9.300000e+01 | 9.700000e+01 | 1.000000e+02 | 1.000000e+02 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_accuracy | 5468 | 0.81 | 9.730000e+00 | 0.66 | 2.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_cleanliness | 5466 | 0.81 | 9.380000e+00 | 0.96 | 2.000000e+00 | 9.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_checkin | 5488 | 0.81 | 9.820000e+00 | 0.56 | 2.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_communication | 5470 | 0.81 | 9.860000e+00 | 0.53 | 2.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_location | 5491 | 0.81 | 9.600000e+00 | 0.68 | 2.000000e+00 | 9.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_value | 5495 | 0.81 | 9.450000e+00 | 0.76 | 2.000000e+00 | 9.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| calculated_host_listings_count | 0 | 1.00 | 4.450000e+00 | 28.06 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.810000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| calculated_host_listings_count_entire_homes | 0 | 1.00 | 4.170000e+00 | 28.08 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.810000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| calculated_host_listings_count_private_rooms | 0 | 1.00 | 2.600000e-01 | 0.73 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.200000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| calculated_host_listings_count_shared_rooms | 0 | 1.00 | 1.000000e-02 | 0.33 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.300000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| reviews_per_month | 4968 | 0.83 | 4.900000e-01 | 0.73 | 1.000000e-02 | 1.200000e-01 | 2.800000e-01 | 5.900000e-01 | 3.065000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
From this list can be concluded that there are 28,523 values for 106 variables. there are a couple of values that contain NA, 1 of them, cleaning_fee, stands out and is one of the variables that was flagged as important in this analysis. This will be addressed in the next section. Other variables that stand out are price, cleaning_fee and extra_people. these variables are stored as character values and need to have the $-sign removed to be stored as a numeric value. Since these variables will be used further in the analyis, this also will be resolved in the next section where we clean the data and run summary statistics.
In this section we will clean the data and run summary statistics that will be visualised in the next section. First the values stored as characters are converted to numeric values and the $-sign removed.
Following this, the NA’s in the cleaning_fee values will be converted to zero. Assuming the NA’s are from hosts not providing an input for cleaning fees, the NA’s should equal zero.
The property type categorical variables has too many categories to be summarised. To reduce this, the 4 most substantial categories have been specified, the remainder will be classed as other.
## `summarise()` ungrouping output (override with `.groups` argument)
| property_type | count |
|---|---|
| Apartment | 23942 |
| Condominium | 1673 |
| House | 1365 |
| Townhouse | 554 |
| Serviced apartment | 331 |
| Loft | 231 |
| Villa | 190 |
| Hostel | 28 |
| Guesthouse | 23 |
| Bungalow | 22 |
| Guest suite | 22 |
| Bed and breakfast | 21 |
| Houseboat | 21 |
| Boat | 20 |
| Hotel | 20 |
| Other | 14 |
| Tiny house | 14 |
| Cabin | 10 |
| Boutique hotel | 5 |
| Cottage | 4 |
| Hut | 3 |
| Aparthotel | 2 |
| Barn | 2 |
| Camper/RV | 1 |
| Casa particular (Cuba) | 1 |
| Earth house | 1 |
| Island | 1 |
| Lighthouse | 1 |
| Tent | 1 |
| Name | listings |
| Number of rows | 28523 |
| Number of columns | 107 |
| _______________________ | |
| Column type frequency: | |
| character | 43 |
| Date | 5 |
| logical | 16 |
| numeric | 43 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_url | 0 | 1.00 | 33 | 37 | 0 | 28523 | 0 |
| name | 57 | 1.00 | 1 | 211 | 0 | 26907 | 0 |
| summary | 1096 | 0.96 | 1 | 1000 | 0 | 26986 | 0 |
| space | 11390 | 0.60 | 1 | 1000 | 0 | 16806 | 0 |
| description | 515 | 0.98 | 1 | 1000 | 0 | 27748 | 0 |
| experiences_offered | 0 | 1.00 | 4 | 4 | 0 | 1 | 0 |
| neighborhood_overview | 12407 | 0.57 | 1 | 1000 | 0 | 15520 | 0 |
| notes | 20832 | 0.27 | 1 | 1000 | 0 | 7366 | 0 |
| transit | 11316 | 0.60 | 1 | 1000 | 0 | 16714 | 0 |
| access | 15751 | 0.45 | 1 | 1000 | 0 | 11251 | 0 |
| interaction | 14111 | 0.51 | 1 | 1000 | 0 | 13728 | 0 |
| house_rules | 13332 | 0.53 | 1 | 1000 | 0 | 13703 | 0 |
| picture_url | 0 | 1.00 | 81 | 146 | 0 | 28282 | 0 |
| host_url | 0 | 1.00 | 37 | 43 | 0 | 25745 | 0 |
| host_name | 12 | 1.00 | 1 | 34 | 0 | 6415 | 0 |
| host_location | 93 | 1.00 | 2 | 152 | 0 | 863 | 0 |
| host_about | 14028 | 0.51 | 1 | 3550 | 0 | 12478 | 34 |
| host_response_time | 11 | 1.00 | 3 | 18 | 0 | 5 | 0 |
| host_response_rate | 11 | 1.00 | 2 | 4 | 0 | 44 | 0 |
| host_acceptance_rate | 11 | 1.00 | 2 | 4 | 0 | 100 | 0 |
| host_thumbnail_url | 11 | 1.00 | 55 | 106 | 0 | 25669 | 0 |
| host_picture_url | 11 | 1.00 | 57 | 109 | 0 | 25669 | 0 |
| host_neighbourhood | 8007 | 0.72 | 1 | 21 | 0 | 56 | 0 |
| host_verifications | 0 | 1.00 | 2 | 156 | 0 | 316 | 0 |
| street | 0 | 1.00 | 10 | 61 | 0 | 528 | 0 |
| neighbourhood | 0 | 1.00 | 5 | 14 | 0 | 21 | 0 |
| neighbourhood_cleansed | 0 | 1.00 | 5 | 25 | 0 | 11 | 0 |
| city | 13 | 1.00 | 1 | 26 | 0 | 136 | 0 |
| state | 24042 | 0.16 | 1 | 25 | 0 | 170 | 0 |
| market | 864 | 0.97 | 6 | 21 | 0 | 10 | 0 |
| smart_location | 0 | 1.00 | 10 | 35 | 0 | 158 | 0 |
| country_code | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
| country | 0 | 1.00 | 7 | 7 | 0 | 1 | 0 |
| property_type | 0 | 1.00 | 3 | 22 | 0 | 29 | 0 |
| room_type | 0 | 1.00 | 10 | 15 | 0 | 4 | 0 |
| bed_type | 0 | 1.00 | 5 | 13 | 0 | 5 | 0 |
| amenities | 0 | 1.00 | 2 | 1179 | 0 | 26634 | 0 |
| weekly_price | 25008 | 0.12 | 7 | 11 | 0 | 718 | 0 |
| monthly_price | 26971 | 0.05 | 7 | 11 | 0 | 476 | 0 |
| security_deposit | 13845 | 0.51 | 5 | 10 | 0 | 386 | 0 |
| calendar_updated | 0 | 1.00 | 5 | 13 | 0 | 82 | 0 |
| cancellation_policy | 0 | 1.00 | 8 | 27 | 0 | 4 | 0 |
| prop_type_simplified | 0 | 1.00 | 5 | 11 | 0 | 5 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| last_scraped | 0 | 1.00 | 2020-06-26 | 2020-06-30 | 2020-06-27 | 5 |
| host_since | 11 | 1.00 | 2008-06-27 | 2020-06-26 | 2015-06-22 | 3379 |
| calendar_last_scraped | 0 | 1.00 | 2020-06-26 | 2020-06-30 | 2020-06-27 | 5 |
| first_review | 4968 | 0.83 | 2009-09-04 | 2020-06-28 | 2017-08-07 | 2582 |
| last_review | 4968 | 0.83 | 2011-08-10 | 2020-06-28 | 2019-07-29 | 1846 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| thumbnail_url | 28523 | 0 | NaN | : |
| medium_url | 28523 | 0 | NaN | : |
| xl_picture_url | 28523 | 0 | NaN | : |
| host_is_superhost | 11 | 1 | 0.10 | FAL: 25610, TRU: 2902 |
| host_has_profile_pic | 11 | 1 | 1.00 | TRU: 28445, FAL: 67 |
| host_identity_verified | 11 | 1 | 0.38 | FAL: 17790, TRU: 10722 |
| neighbourhood_group_cleansed | 28523 | 0 | NaN | : |
| is_location_exact | 0 | 1 | 0.79 | TRU: 22507, FAL: 6016 |
| has_availability | 0 | 1 | 1.00 | TRU: 28523 |
| requires_license | 0 | 1 | 0.00 | FAL: 28523 |
| license | 28523 | 0 | NaN | : |
| jurisdiction_names | 28523 | 0 | NaN | : |
| instant_bookable | 0 | 1 | 0.29 | FAL: 20360, TRU: 8163 |
| is_business_travel_ready | 0 | 1 | 0.00 | FAL: 28523 |
| require_guest_profile_picture | 0 | 1 | 0.01 | FAL: 28379, TRU: 144 |
| require_guest_phone_verification | 0 | 1 | 0.01 | FAL: 28327, TRU: 196 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 2.044063e+07 | 12261143.29 | 6.983000e+03 | 1.008379e+07 | 1.950297e+07 | 3.043587e+07 | 4.395635e+07 | <U+2587><U+2587><U+2587><U+2585><U+2586> |
| scrape_id | 0 | 1.00 | 2.020063e+13 | 0.00 | 2.020063e+13 | 2.020063e+13 | 2.020063e+13 | 2.020063e+13 | 2.020063e+13 | <U+2581><U+2581><U+2587><U+2581><U+2581> |
| host_id | 0 | 1.00 | 6.731400e+07 | 76172316.22 | 5.130000e+02 | 1.281774e+07 | 3.604205e+07 | 9.463162e+07 | 3.519509e+08 | <U+2587><U+2582><U+2581><U+2581><U+2581> |
| host_listings_count | 11 | 1.00 | 5.460000e+00 | 34.32 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 7.370000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| host_total_listings_count | 11 | 1.00 | 5.460000e+00 | 34.32 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 7.370000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| zipcode | 815 | 0.97 | 2.069760e+03 | 390.81 | 2.000000e+01 | 1.855750e+03 | 2.200000e+03 | 2.300000e+03 | 2.100000e+04 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| latitude | 0 | 1.00 | 5.568000e+01 | 0.02 | 5.562000e+01 | 5.567000e+01 | 5.568000e+01 | 5.570000e+01 | 5.573000e+01 | <U+2581><U+2583><U+2587><U+2587><U+2581> |
| longitude | 0 | 1.00 | 1.256000e+01 | 0.03 | 1.245000e+01 | 1.254000e+01 | 1.255000e+01 | 1.258000e+01 | 1.264000e+01 | <U+2581><U+2582><U+2587><U+2585><U+2582> |
| accommodates | 0 | 1.00 | 3.320000e+00 | 1.63 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.600000e+01 | <U+2587><U+2582><U+2581><U+2581><U+2581> |
| bathrooms | 12 | 1.00 | 1.080000e+00 | 0.28 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| bedrooms | 29 | 1.00 | 1.550000e+00 | 1.06 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.000000e+00 | 1.010000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| beds | 103 | 1.00 | 2.040000e+00 | 1.44 | 0.000000e+00 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 2.500000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| square_feet | 28130 | 0.01 | 7.218900e+02 | 576.10 | 0.000000e+00 | 1.200000e+02 | 7.640000e+02 | 1.076000e+03 | 2.799000e+03 | <U+2587><U+2587><U+2583><U+2581><U+2581> |
| price | 0 | 1.00 | 8.491900e+02 | 1066.81 | 0.000000e+00 | 4.980000e+02 | 6.980000e+02 | 9.960000e+02 | 6.917500e+04 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| cleaning_fee | 0 | 1.00 | 2.103900e+02 | 249.69 | 0.000000e+00 | 0.000000e+00 | 1.500000e+02 | 3.000000e+02 | 4.000000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| guests_included | 0 | 1.00 | 1.520000e+00 | 1.06 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.000000e+00 | 1.600000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| extra_people | 0 | 1.00 | 6.388000e+01 | 126.23 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.000000e+02 | 2.024000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| minimum_nights | 0 | 1.00 | 3.850000e+00 | 18.07 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.100000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| maximum_nights | 0 | 1.00 | 6.206700e+02 | 553.19 | 1.000000e+00 | 1.500000e+01 | 1.125000e+03 | 1.125000e+03 | 9.999000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| minimum_minimum_nights | 0 | 1.00 | 3.840000e+00 | 18.08 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.100000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| maximum_minimum_nights | 0 | 1.00 | 4.120000e+00 | 19.29 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.100000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| minimum_maximum_nights | 0 | 1.00 | 6.669200e+02 | 548.00 | 1.000000e+00 | 2.000000e+01 | 1.125000e+03 | 1.125000e+03 | 9.999000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| maximum_maximum_nights | 0 | 1.00 | 6.703800e+02 | 547.39 | 1.000000e+00 | 2.000000e+01 | 1.125000e+03 | 1.125000e+03 | 9.999000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| minimum_nights_avg_ntm | 0 | 1.00 | 3.970000e+00 | 18.35 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 4.000000e+00 | 1.100000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| maximum_nights_avg_ntm | 0 | 1.00 | 6.684100e+02 | 547.20 | 1.000000e+00 | 2.000000e+01 | 1.125000e+03 | 1.125000e+03 | 9.999000e+03 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| availability_30 | 0 | 1.00 | 5.870000e+00 | 10.42 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 8.000000e+00 | 3.000000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2582> |
| availability_60 | 0 | 1.00 | 1.124000e+01 | 20.15 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.400000e+01 | 6.000000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2582> |
| availability_90 | 0 | 1.00 | 1.649000e+01 | 29.99 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.800000e+01 | 9.000000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| availability_365 | 0 | 1.00 | 4.947000e+01 | 99.23 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 3.500000e+01 | 3.650000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| number_of_reviews | 0 | 1.00 | 1.357000e+01 | 27.09 | 0.000000e+00 | 1.000000e+00 | 5.000000e+00 | 1.500000e+01 | 6.370000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| number_of_reviews_ltm | 0 | 1.00 | 2.740000e+00 | 6.64 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 3.000000e+00 | 3.760000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| review_scores_rating | 5447 | 0.81 | 9.516000e+01 | 6.83 | 2.000000e+01 | 9.300000e+01 | 9.700000e+01 | 1.000000e+02 | 1.000000e+02 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_accuracy | 5468 | 0.81 | 9.730000e+00 | 0.66 | 2.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_cleanliness | 5466 | 0.81 | 9.380000e+00 | 0.96 | 2.000000e+00 | 9.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_checkin | 5488 | 0.81 | 9.820000e+00 | 0.56 | 2.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_communication | 5470 | 0.81 | 9.860000e+00 | 0.53 | 2.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_location | 5491 | 0.81 | 9.600000e+00 | 0.68 | 2.000000e+00 | 9.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| review_scores_value | 5495 | 0.81 | 9.450000e+00 | 0.76 | 2.000000e+00 | 9.000000e+00 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| calculated_host_listings_count | 0 | 1.00 | 4.450000e+00 | 28.06 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.810000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| calculated_host_listings_count_entire_homes | 0 | 1.00 | 4.170000e+00 | 28.08 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 2.810000e+02 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| calculated_host_listings_count_private_rooms | 0 | 1.00 | 2.600000e-01 | 0.73 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.200000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| calculated_host_listings_count_shared_rooms | 0 | 1.00 | 1.000000e-02 | 0.33 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.300000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| reviews_per_month | 4968 | 0.83 | 4.900000e-01 | 0.73 | 1.000000e-02 | 1.200000e-01 | 2.800000e-01 | 5.900000e-01 | 3.065000e+01 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
To only have a list of variables that will be used in the analysis, redundant variables are filtered out. As the analysis focuses on short holiday stays, the long stay listings need to be removed from the list. By looking at the minimum nights on the listings, it can be determined if these addresses are viable for short stays.
| minimum_nights | count |
|---|---|
| 2 | 8500 |
| 3 | 6479 |
| 1 | 5444 |
| 4 | 3117 |
| 5 | 2346 |
| 7 | 997 |
| 6 | 702 |
| 14 | 198 |
| 10 | 170 |
| 30 | 136 |
| 8 | 52 |
| 20 | 49 |
| 15 | 35 |
| 21 | 33 |
| 60 | 23 |
| 9 | 22 |
| 12 | 21 |
| 90 | 21 |
| 13 | 20 |
| 25 | 16 |
| 28 | 14 |
| 50 | 14 |
| 100 | 8 |
| 31 | 7 |
| 11 | 6 |
| 23 | 6 |
| 29 | 6 |
| 45 | 5 |
| 16 | 4 |
| 19 | 4 |
| 200 | 4 |
| 1000 | 4 |
| 17 | 3 |
| 40 | 3 |
| 70 | 3 |
| 80 | 3 |
| 120 | 3 |
| 18 | 2 |
| 22 | 2 |
| 24 | 2 |
| 26 | 2 |
| 27 | 2 |
| 49 | 2 |
| 85 | 2 |
| 160 | 2 |
| 180 | 2 |
| 360 | 2 |
| 365 | 2 |
| 999 | 2 |
| 34 | 1 |
| 35 | 1 |
| 39 | 1 |
| 42 | 1 |
| 43 | 1 |
| 48 | 1 |
| 56 | 1 |
| 59 | 1 |
| 61 | 1 |
| 66 | 1 |
| 75 | 1 |
| 89 | 1 |
| 92 | 1 |
| 102 | 1 |
| 107 | 1 |
| 150 | 1 |
| 270 | 1 |
| 300 | 1 |
| 430 | 1 |
| 500 | 1 |
| 1100 | 1 |
In this summary it can be seen that after 5 minimum days the numbers of listing drop significantly.The majority of listings are within the 1 to 4 minimum days stay. Therefore this will give a good indication of the listings intended for short, holiday, stays. In the next section the data is filtered to only feature a minimum of 4 nights or less. Also, the variables that will be used in the analysis are selected, so to only have relevant variables left to make the analysis more concise and clear.
Let’s have a look at the data.
| Name | listings_clean |
| Number of rows | 28523 |
| Number of columns | 20 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| logical | 2 |
| numeric | 13 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_url | 0 | 1 | 33 | 37 | 0 | 28523 | 0 |
| prop_type_simplified | 0 | 1 | 5 | 11 | 0 | 5 | 0 |
| room_type | 0 | 1 | 10 | 15 | 0 | 4 | 0 |
| neighbourhood_cleansed | 0 | 1 | 5 | 25 | 0 | 11 | 0 |
| cancellation_policy | 0 | 1 | 8 | 27 | 0 | 4 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| host_is_superhost | 11 | 1 | 0.10 | FAL: 25610, TRU: 2902 |
| is_location_exact | 0 | 1 | 0.79 | TRU: 22507, FAL: 6016 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 20440633.59 | 12261143.29 | 6983.00 | 10083785.00 | 19502966.00 | 30435871.00 | 43956346.00 | <U+2587><U+2587><U+2587><U+2585><U+2586> |
| price | 0 | 1.00 | 849.19 | 1066.81 | 0.00 | 498.00 | 698.00 | 996.00 | 69175.00 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| guests_included | 0 | 1.00 | 1.52 | 1.06 | 1.00 | 1.00 | 1.00 | 2.00 | 16.00 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| cleaning_fee | 0 | 1.00 | 210.39 | 249.69 | 0.00 | 0.00 | 150.00 | 300.00 | 4000.00 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| extra_people | 0 | 1.00 | 63.88 | 126.23 | 0.00 | 0.00 | 0.00 | 100.00 | 2024.00 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| number_of_reviews | 0 | 1.00 | 13.57 | 27.09 | 0.00 | 1.00 | 5.00 | 15.00 | 637.00 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| review_scores_rating | 5447 | 0.81 | 95.16 | 6.83 | 20.00 | 93.00 | 97.00 | 100.00 | 100.00 | <U+2581><U+2581><U+2581><U+2581><U+2587> |
| bathrooms | 12 | 1.00 | 1.08 | 0.28 | 0.00 | 1.00 | 1.00 | 1.00 | 10.00 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| bedrooms | 29 | 1.00 | 1.55 | 1.06 | 0.00 | 1.00 | 1.00 | 2.00 | 101.00 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| beds | 103 | 1.00 | 2.04 | 1.44 | 0.00 | 1.00 | 2.00 | 3.00 | 25.00 | <U+2587><U+2581><U+2581><U+2581><U+2581> |
| accommodates | 0 | 1.00 | 3.32 | 1.63 | 1.00 | 2.00 | 3.00 | 4.00 | 16.00 | <U+2587><U+2582><U+2581><U+2581><U+2581> |
| longitude | 0 | 1.00 | 12.56 | 0.03 | 12.45 | 12.54 | 12.55 | 12.58 | 12.64 | <U+2581><U+2582><U+2587><U+2585><U+2582> |
| latitude | 0 | 1.00 | 55.68 | 0.02 | 55.62 | 55.67 | 55.68 | 55.70 | 55.73 | <U+2581><U+2583><U+2587><U+2587><U+2581> |
Now that the data is clean, a selection of useful data has been made, and filtered to only feature our criteria for a short stay holiday listing, visualisations of data can be made. This lays the foundation of any correlations and questions the regression model will have to explain and answer.
In order to get an idea on the concentration of apartment listings in the area of Copenhagen, we plotted a map showing the number of flat available for rent for 4 nights or less. We used clusters instead of point to make the map more readable and to get a better idea of the number of listings in each area.
The map shows the number of available flats are located in the city center. The more we get away from the center the fewer the number of flats available.To gauge the price difference in each area in Copenhagen, we plotted a heatmap showing the more expensive areas in a brighter red colour. Also we plotted a bar chart to get an idea of the median price for each area. The reason for using median over mean is because the listings in Copenhagen have very extreme values and they skew the mean too much for certain areas.
We can conclude that the harbour area in Copehagen has the most expensive listings. Therefore we suspect that location will have a high coefficient when we will be running te regression models.
The Indre By area in Copenhagen is the central harbour area and is the most expensive area for AirBnB prices. This confirms that the neighborhood variable will have a significant impact on the price variable.
Next up, we have plotted a number of bar charts to display the price relationship with the types of accommodation.
Median price per night per property type: The histogram showed that among all the property types, the lowest median price per night is allocated to the appartments
Median price per room type: The histogram showed that hotel rooms are generally more expensive than the other types of rooms available.


From this we can conclude that both room type and property type have an influence on the price.
Following from this, we wanted to know what influence rating has on the price. For this we used both data on the review scores and whether the listing is posted by a superhost.
## Warning: Removed 2 rows containing non-finite values (stat_density).

## Warning: Removed 5442 rows containing non-finite values (stat_bin).

## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 5448 rows containing non-finite values (stat_smooth).
## Warning: Removed 5447 rows containing missing values (geom_point).
From this we can conclude that being a superhost does have an influence on the price. People would prefer a superhost, so it seems logical that the average price for a superhost is higher than a listing under a normal host. The ratings however don’t seem to be related to being a superhost. In fact the perfect scores are more among non-superhosts. This could be because of the higher expectations set by booking via a superhost, this could make guests more critical when writing a review. finally, there does not seem to be a strong correlation with a rating score and the price. Hosts will not alter their prices to their review scores.
To display a comprehensive overview of the correlation between variables and the price we plotted a GGpairs plot. Using all the selected variables.
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 5447 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 29 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 12 rows containing missing values
## Warning: Removed 5447 rows containing missing values (geom_point).
## Warning: Removed 5447 rows containing non-finite values (stat_density).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 5458 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 5454 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 5447 rows containing missing values
## Warning: Removed 29 rows containing missing values (geom_point).
## Warning: Removed 5458 rows containing missing values (geom_point).
## Warning: Removed 29 rows containing non-finite values (stat_density).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 38 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 29 rows containing missing values
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 5454 rows containing missing values (geom_point).
## Warning: Removed 38 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_density).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 12 rows containing missing values
## Warning: Removed 5447 rows containing missing values (geom_point).
## Warning: Removed 29 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).

Before we start with the regression modeling, we need to filter the data to represent 2 people staying for 4 nights, including the cleaning fee.
The following plots show the density for the total price for 4 nights wjth and without a log function.
# Density plots for price_4_nights
density.default(listings_new$price_4_nights)
##
## Call:
## density.default(x = listings_new$price_4_nights)
##
## Data: listings_new$price_4_nights (28523 obs.); Bandwidth 'bw' = 173
##
## x y
## Min. : -518.9 Min. :0.000e+00
## 1st Qu.: 68915.5 1st Qu.:0.000e+00
## Median :138350.0 Median :0.000e+00
## Mean :138350.0 Mean :4.589e-06
## 3rd Qu.:207784.5 3rd Qu.:2.000e-10
## Max. :277218.9 Max. :3.565e-04
# Calculate the most frequently occurring price (mode) for 4 nights
max_p4n <- density(listings_new$price_4_nights)$x[which.max(density(listings_new$price_4_nights)$y)]
max_p4n
## [1] 2742.17
# Plot density of price_4_nights
ggplot(
listings_new,
aes(x = price_4_nights)) +
geom_density() +
# Plot vertical line to show maximum value
geom_vline(xintercept = max_p4n) +
theme_classic() +
NULL

# Plot density of log(price_4_nights)
ggplot(listings_new, aes(x = log(price_4_nights))) +
geom_density() +
theme_classic() +
NULL
## Warning: Removed 1 rows containing non-finite values (stat_density).
we should use variable log(price_4_nights). Looking at the density plots of
price_4_nights and log(price_4_nights), we can see clearly that log(price_4_nights) is more close to a normal distribution, while most values in price_4_nights are cramped within a very small range.
Fitting a regression model called model1 with the following explanatory variables: prop_type_simplified, number_of_reviews, and review_scores_rating
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.24 | 0.0198 | 164 | 0 | 3.21 | 3.28 |
| prop_type_simplifiedCondominium | 0.033 | 0.00576 | 5.74 | 9.66e-09 | 0.0218 | 0.0443 |
| prop_type_simplifiedHouse | 0.119 | 0.00693 | 17.1 | 4.35e-65 | 0.105 | 0.132 |
| prop_type_simplifiedOther | 0.178 | 0.00872 | 20.4 | 1.7e-91 | 0.161 | 0.195 |
| prop_type_simplifiedTownhouse | 0.177 | 0.0102 | 17.4 | 2.2e-67 | 0.157 | 0.197 |
| number_of_reviews | 0.000188 | 4.84e-05 | 3.89 | 0.000101 | 9.34e-05 | 0.000283 |
| review_scores_rating | 0.00235 | 0.000207 | 11.3 | 9.89e-30 | 0.00194 | 0.00276 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.046 | 0.0457 | 0.215 | 185 | 3.13e-231 | 6 | 2.77e+03 | -5.53e+03 | -5.46e+03 | 1.06e+03 | 23069 | 23076 |
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

Ratings (review_scores_rating) is a significant predictor. Controlling for other variables, every 1 point increase in rating is associated with $ 0.002 increase in log10(price_4_nights).
Interpretation of prop_type_simplified predictor:
Property type (prop_type_simplified) is a significant predictor. Controlling for other variables, switching property type from apartment to condomunium would result in 0.034 increase in log10(price_4_nights). Similarly, switching property type from apartment to house would result in 0.11 increase in log10(price_4_nights); switching property type from apartment to townhouse would result in 0.17 increase in log10(price_4_nights); switching property type from apartment to property types other than the aforementioned 3 types would result in an average of 0.18 increase in log10(price_4_nights).
Model 1: 4%
We want to determine if room_type is a significant predictor of the cost for 4 nights, given everything else in the model.
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.36 | 0.0177 | 189 | 0 | 3.32 | 3.39 |
| prop_type_simplifiedCondominium | 0.0168 | 0.00513 | 3.27 | 0.00107 | 0.00674 | 0.0269 |
| prop_type_simplifiedHouse | 0.126 | 0.00618 | 20.4 | 1.8e-91 | 0.114 | 0.138 |
| prop_type_simplifiedOther | 0.168 | 0.00797 | 21.1 | 1.14e-97 | 0.152 | 0.184 |
| prop_type_simplifiedTownhouse | 0.166 | 0.00905 | 18.3 | 1.98e-74 | 0.148 | 0.184 |
| number_of_reviews | 0.000668 | 4.36e-05 | 15.3 | 1.33e-52 | 0.000582 | 0.000753 |
| review_scores_rating | 0.00154 | 0.000185 | 8.33 | 8.6e-17 | 0.00118 | 0.00191 |
| room_typeHotel room | 0.112 | 0.0342 | 3.28 | 0.00104 | 0.0452 | 0.179 |
| room_typePrivate room | -0.271 | 0.00355 | -76.3 | 0 | -0.278 | -0.264 |
| room_typeShared room | -0.362 | 0.0246 | -14.7 | 5.51e-49 | -0.41 | -0.314 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.243 | 0.243 | 0.191 | 823 | 0 | 9 | 5.44e+03 | -1.09e+04 | -1.08e+04 | 843 | 23066 | 23076 |
Interpretation of room_type predictor:
The result of model2 regression shows that rooms_type is a significant predictor. Controlling for other variables, switching room type from entire house to hotel room would result in 0.11 increase in log10(price_4_nights). Nevertheless, switching from entire house to private room would result in 0.27 drop in log10(price_4_nights); switching from entire house to shared room would result in 0.38 drop in log10(price_4_nights).
Model 2: 24%
Are the number of bathrooms, bedrooms, beds, or size of the house (accomodates) significant predictors of price_4_nights?
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.08 | 0.0163 | 189 | 0 | 3.05 | 3.12 |
| prop_type_simplifiedCondominium | 0.0105 | 0.00452 | 2.31 | 0.0207 | 0.0016 | 0.0193 |
| prop_type_simplifiedHouse | -0.0476 | 0.00594 | -8.02 | 1.08e-15 | -0.0593 | -0.036 |
| prop_type_simplifiedOther | 0.0906 | 0.00709 | 12.8 | 2.97e-37 | 0.0767 | 0.105 |
| prop_type_simplifiedTownhouse | 0.00279 | 0.00835 | 0.334 | 0.738 | -0.0136 | 0.0191 |
| number_of_reviews | 0.000596 | 3.85e-05 | 15.5 | 7.78e-54 | 0.000521 | 0.000672 |
| review_scores_rating | 0.00166 | 0.000164 | 10.2 | 3.24e-24 | 0.00134 | 0.00199 |
| room_typeHotel room | 0.223 | 0.0302 | 7.38 | 1.65e-13 | 0.164 | 0.282 |
| room_typePrivate room | -0.18 | 0.00338 | -53.2 | 0 | -0.187 | -0.174 |
| room_typeShared room | -0.261 | 0.0217 | -12 | 3e-33 | -0.303 | -0.218 |
| bathrooms | 0.0581 | 0.0047 | 12.4 | 4.54e-35 | 0.0489 | 0.0673 |
| bedrooms | 0.0501 | 0.00224 | 22.3 | 1.83e-109 | 0.0457 | 0.0545 |
| beds | -0.00273 | 0.00125 | -2.19 | 0.0283 | -0.00517 | -0.00029 |
| accommodates | 0.0389 | 0.00131 | 29.6 | 2.69e-189 | 0.0363 | 0.0415 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.412 | 0.412 | 0.168 | 1.24e+03 | 0 | 13 | 8.36e+03 | -1.67e+04 | -1.66e+04 | 653 | 23021 | 23035 |
## GVIF Df GVIF^(1/(2*Df))
## prop_type_simplified 1.373951 4 1.040510
## number_of_reviews 1.029840 1 1.014810
## review_scores_rating 1.012554 1 1.006257
## room_type 1.272593 3 1.040994
## bathrooms 1.332643 1 1.154401
## bedrooms 3.071499 1 1.752569
## beds 2.550304 1 1.596967
## accommodates 3.628682 1 1.904910

The number of bathrooms, bedrooms and size of the house (accommodates) are significant predictors of log10(price_4_nights) and are all positively related with price_4_nights, while number of beds is not a significant predictor. Coefficients show that each additional bathroom is associated with 0.06 increase in log10(price_4_nights); each additional bedroom is associated with 0.05 increase in log10(price_4_nights). There might be a problem with multi-collinearity since intuitively the number of bathrooms and bedrooms should have a positive relationship with size of the house. A further analysis on VIF shows that size of the house has slight correlation with other predictors but is within an acceptable range.
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.08 | 0.0163 | 189 | 0 | 3.05 | 3.12 |
| prop_type_simplifiedCondominium | 0.0106 | 0.00452 | 2.34 | 0.0195 | 0.0017 | 0.0194 |
| prop_type_simplifiedHouse | -0.0481 | 0.00593 | -8.11 | 5.24e-16 | -0.0598 | -0.0365 |
| prop_type_simplifiedOther | 0.0905 | 0.00709 | 12.7 | 4.17e-37 | 0.0766 | 0.104 |
| prop_type_simplifiedTownhouse | 0.00227 | 0.00834 | 0.273 | 0.785 | -0.0141 | 0.0186 |
| number_of_reviews | 0.000598 | 3.85e-05 | 15.5 | 3.5e-54 | 0.000523 | 0.000674 |
| review_scores_rating | 0.00167 | 0.000164 | 10.2 | 1.7e-24 | 0.00135 | 0.00199 |
| room_typeHotel room | 0.223 | 0.0302 | 7.39 | 1.58e-13 | 0.164 | 0.282 |
| room_typePrivate room | -0.181 | 0.00338 | -53.5 | 0 | -0.187 | -0.174 |
| room_typeShared room | -0.262 | 0.0217 | -12.1 | 1.91e-33 | -0.304 | -0.219 |
| bathrooms | 0.0581 | 0.0047 | 12.4 | 4.75e-35 | 0.0489 | 0.0673 |
| bedrooms | 0.0488 | 0.00216 | 22.6 | 1.72e-111 | 0.0446 | 0.0531 |
| accommodates | 0.0376 | 0.00118 | 31.9 | 1.11e-218 | 0.0353 | 0.0399 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.413 | 0.412 | 0.168 | 1.35e+03 | 0 | 12 | 8.36e+03 | -1.67e+04 | -1.66e+04 | 654 | 23047 | 23060 |
## GVIF Df GVIF^(1/(2*Df))
## prop_type_simplified 1.370590 4 1.040192
## number_of_reviews 1.029562 1 1.014673
## review_scores_rating 1.011882 1 1.005924
## room_type 1.271896 3 1.040899
## bathrooms 1.332684 1 1.154419
## bedrooms 2.862720 1 1.691957
## accommodates 2.929976 1 1.711717

Model 3: 41%
Do superhosts command a pricing premium, after controlling for other variables?
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.1 | 0.0164 | 189 | 0 | 3.07 | 3.13 |
| prop_type_simplifiedCondominium | 0.00988 | 0.00452 | 2.19 | 0.0287 | 0.00103 | 0.0187 |
| prop_type_simplifiedHouse | -0.0478 | 0.00592 | -8.07 | 7.47e-16 | -0.0594 | -0.0362 |
| prop_type_simplifiedOther | 0.089 | 0.00708 | 12.6 | 4.07e-36 | 0.0751 | 0.103 |
| prop_type_simplifiedTownhouse | 0.00141 | 0.00833 | 0.17 | 0.865 | -0.0149 | 0.0177 |
| number_of_reviews | 0.000484 | 4.01e-05 | 12.1 | 2.24e-33 | 0.000405 | 0.000562 |
| review_scores_rating | 0.00145 | 0.000165 | 8.81 | 1.28e-18 | 0.00113 | 0.00178 |
| room_typeHotel room | 0.226 | 0.0301 | 7.49 | 6.91e-14 | 0.167 | 0.285 |
| room_typePrivate room | -0.183 | 0.00338 | -54.1 | 0 | -0.189 | -0.176 |
| room_typeShared room | -0.261 | 0.0216 | -12.1 | 2.06e-33 | -0.304 | -0.219 |
| bathrooms | 0.0582 | 0.00469 | 12.4 | 3.14e-35 | 0.049 | 0.0673 |
| bedrooms | 0.0491 | 0.00216 | 22.7 | 2.89e-113 | 0.0449 | 0.0534 |
| accommodates | 0.0374 | 0.00118 | 31.8 | 9.41e-217 | 0.0351 | 0.0397 |
| host_is_superhostTRUE | 0.0354 | 0.00358 | 9.88 | 5.54e-23 | 0.0284 | 0.0424 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.415 | 0.415 | 0.168 | 1.26e+03 | 0 | 13 | 8.4e+03 | -1.68e+04 | -1.67e+04 | 651 | 23040 | 23054 |
## GVIF Df GVIF^(1/(2*Df))
## prop_type_simplified 1.371707 4 1.040298
## number_of_reviews 1.122623 1 1.059539
## review_scores_rating 1.028858 1 1.014326
## room_type 1.277660 3 1.041684
## bathrooms 1.332709 1 1.154430
## bedrooms 2.863006 1 1.692042
## accommodates 2.930492 1 1.711868
## host_is_superhost 1.122152 1 1.059317

Interpretation of superhost predictor:
Since the p-value for the estimate of the superhost coefficient is significant (with a p-value of 0.36 and a t-statistic of 0.9), it seems that superhosts do command a price premium from their guests in Copenhagen. Compared to non-superhost, being a superhost is associated with 0.04 increase in log10(price_4_nights).
Since the p-value for the estimate of the superhost coefficient is significant (with a p-value of 0.36 and a t-statistic of 0.9), it seems that superhosts do command a price premium from their guests in Copenhagen. Compared to non-superhost, being a superhost is associated with 0.03 increase in log10(price_4_nights).
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.11 | 0.0165 | 188 | 0 | 3.07 | 3.14 |
| prop_type_simplifiedCondominium | 0.00979 | 0.00452 | 2.17 | 0.0301 | 0.000945 | 0.0186 |
| prop_type_simplifiedHouse | -0.0477 | 0.00592 | -8.06 | 8.12e-16 | -0.0593 | -0.0361 |
| prop_type_simplifiedOther | 0.0891 | 0.00708 | 12.6 | 3.59e-36 | 0.0752 | 0.103 |
| prop_type_simplifiedTownhouse | 0.0015 | 0.00833 | 0.18 | 0.857 | -0.0148 | 0.0178 |
| number_of_reviews | 0.000489 | 4.02e-05 | 12.2 | 5.72e-34 | 0.00041 | 0.000568 |
| review_scores_rating | 0.00141 | 0.000166 | 8.45 | 3.12e-17 | 0.00108 | 0.00173 |
| room_typeHotel room | 0.228 | 0.0301 | 7.55 | 4.39e-14 | 0.169 | 0.287 |
| room_typePrivate room | -0.183 | 0.00338 | -54.1 | 0 | -0.19 | -0.176 |
| room_typeShared room | -0.261 | 0.0216 | -12.1 | 1.78e-33 | -0.304 | -0.219 |
| bathrooms | 0.058 | 0.00469 | 12.4 | 5.16e-35 | 0.0488 | 0.0672 |
| bedrooms | 0.0491 | 0.00216 | 22.7 | 2.95e-113 | 0.0449 | 0.0534 |
| accommodates | 0.0374 | 0.00118 | 31.8 | 5.54e-217 | 0.0351 | 0.0397 |
| host_is_superhostTRUE | -0.242 | 0.122 | -1.99 | 0.0471 | -0.481 | -0.00306 |
| review_scores_rating:host_is_superhostTRUE | 0.00285 | 0.00125 | 2.28 | 0.0228 | 0.000396 | 0.0053 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.415 | 0.415 | 0.168 | 1.17e+03 | 0 | 14 | 8.41e+03 | -1.68e+04 | -1.67e+04 | 651 | 23039 | 23054 |
## GVIF Df GVIF^(1/(2*Df))
## prop_type_simplified 1.371877 4 1.040314
## number_of_reviews 1.126391 1 1.061316
## review_scores_rating 1.046224 1 1.022851
## room_type 1.278764 3 1.041834
## bathrooms 1.333108 1 1.154603
## bedrooms 2.863011 1 1.692043
## accommodates 2.930623 1 1.711906
## host_is_superhost 1301.822960 1 36.080784
## review_scores_rating:host_is_superhost 1301.627193 1 36.078071

Interaction variable host_is_superhost*review_scores_rating is not significant, therefore we do not include it.
Model 4: 41%
Most owners advertise the exact location of their listing (is_location_exact == TRUE), while a non-trivial proportion don’t. After controlling for other variables, is a listing’s exact location a significant predictor of price_4_nights?
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.11 | 0.0165 | 188 | 0 | 3.07 | 3.14 |
| prop_type_simplifiedCondominium | 0.00975 | 0.00452 | 2.16 | 0.0308 | 0.000903 | 0.0186 |
| prop_type_simplifiedHouse | -0.0478 | 0.00592 | -8.07 | 7.27e-16 | -0.0594 | -0.0362 |
| prop_type_simplifiedOther | 0.0884 | 0.00709 | 12.5 | 1.36e-35 | 0.0745 | 0.102 |
| prop_type_simplifiedTownhouse | 0.00103 | 0.00833 | 0.123 | 0.902 | -0.0153 | 0.0174 |
| number_of_reviews | 0.000485 | 4.01e-05 | 12.1 | 1.71e-33 | 0.000406 | 0.000563 |
| review_scores_rating | 0.00146 | 0.000165 | 8.85 | 9.22e-19 | 0.00114 | 0.00178 |
| room_typeHotel room | 0.227 | 0.0301 | 7.52 | 5.6e-14 | 0.168 | 0.286 |
| room_typePrivate room | -0.183 | 0.00338 | -54.1 | 0 | -0.19 | -0.176 |
| room_typeShared room | -0.261 | 0.0216 | -12.1 | 2.35e-33 | -0.303 | -0.218 |
| bathrooms | 0.0582 | 0.00469 | 12.4 | 2.72e-35 | 0.049 | 0.0674 |
| bedrooms | 0.0491 | 0.00216 | 22.7 | 2.79e-113 | 0.0449 | 0.0534 |
| accommodates | 0.0374 | 0.00118 | 31.8 | 3.56e-217 | 0.0351 | 0.0397 |
| host_is_superhostTRUE | 0.0356 | 0.00358 | 9.95 | 2.93e-23 | 0.0286 | 0.0426 |
| is_location_exactTRUE | -0.00547 | 0.00277 | -1.97 | 0.0484 | -0.0109 | -3.73e-05 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.415 | 0.415 | 0.168 | 1.17e+03 | 0 | 14 | 8.41e+03 | -1.68e+04 | -1.67e+04 | 651 | 23039 | 23054 |
## GVIF Df GVIF^(1/(2*Df))
## prop_type_simplified 1.375094 4 1.040619
## number_of_reviews 1.122768 1 1.059607
## review_scores_rating 1.029247 1 1.014518
## room_type 1.278044 3 1.041736
## bathrooms 1.332750 1 1.154448
## bedrooms 2.863006 1 1.692042
## accommodates 2.931407 1 1.712135
## host_is_superhost 1.123547 1 1.059975
## is_location_exact 1.005398 1 1.002695

A listing’s exact location variable has a p value of 0.34, and therefore is not a significant predictor of price_4_nights. Thus, we will not include it in the following models.
Model 5: 41%
In order to reduce the number of neighbourhoods for our analysis we cluster the different neighbourhoods into groups, based on our experience, talking to locals, and research.
The city of Copenhagen has 10 official administrative districts. Additionally, there is Frederiksberg, which is an independent municipality and, thus, separate from the Copenhagen Municipality, however, it is still part of Copenhagen city.
Indre By is the city center of Copenhagen and will remain its own cluster “Center”.
All the residential neighbourhoods surrounding the city center are commonly grouped together, and since they all end in -bro, often referred to as -bro districts (Brokvaterer in Danish). Frederiksberg is often also included in the -bro districts. Hence, we will cluster these 4 Brokvaterer districts into one group of Brokvarterer
Additionally, the two Amager districts (Vest and Ost) are grouped together.
In the West, Valby, Vanlose and Brønshøj-Husum are summarized.
Lastly, Bispebjerg, often referred to as Nordvest, which is more of a residential neighbourhood, is its own cluster.
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.1 | 0.0154 | 202 | 0 | 3.07 | 3.13 |
| prop_type_simplifiedCondominium | 0.0143 | 0.00418 | 3.42 | 0.000622 | 0.00611 | 0.0225 |
| prop_type_simplifiedHouse | 0.0189 | 0.0057 | 3.32 | 0.000915 | 0.00773 | 0.0301 |
| prop_type_simplifiedOther | 0.091 | 0.00657 | 13.9 | 1.83e-43 | 0.0781 | 0.104 |
| prop_type_simplifiedTownhouse | 0.0141 | 0.00775 | 1.82 | 0.0689 | -0.00109 | 0.0293 |
| number_of_reviews | 0.000231 | 3.73e-05 | 6.2 | 5.92e-10 | 0.000158 | 0.000304 |
| review_scores_rating | 0.00132 | 0.000153 | 8.61 | 7.6e-18 | 0.00102 | 0.00162 |
| room_typeHotel room | 0.139 | 0.0279 | 4.97 | 6.9e-07 | 0.0839 | 0.193 |
| room_typePrivate room | -0.175 | 0.00313 | -55.9 | 0 | -0.181 | -0.169 |
| room_typeShared room | -0.246 | 0.02 | -12.3 | 1.46e-34 | -0.285 | -0.207 |
| bathrooms | 0.0541 | 0.00434 | 12.5 | 1.51e-35 | 0.0456 | 0.0626 |
| bedrooms | 0.0466 | 0.002 | 23.3 | 1.26e-118 | 0.0427 | 0.0505 |
| accommodates | 0.0364 | 0.00109 | 33.4 | 1.2e-238 | 0.0343 | 0.0385 |
| neighbourhood_simplifiedBrokvarterer | 0.0274 | 0.00295 | 9.29 | 1.63e-20 | 0.0216 | 0.0331 |
| neighbourhood_simplifiedCenter | 0.161 | 0.00383 | 42.1 | 0 | 0.154 | 0.169 |
| neighbourhood_simplifiedNordvest | -0.0911 | 0.00451 | -20.2 | 3.76e-90 | -0.0999 | -0.0823 |
| neighbourhood_simplifiedWest | -0.0836 | 0.005 | -16.7 | 1.71e-62 | -0.0934 | -0.0738 |
| host_is_superhostTRUE | 0.0311 | 0.00331 | 9.4 | 5.85e-21 | 0.0246 | 0.0376 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.499 | 0.155 | 1.35e+03 | 0 | 17 | 1.02e+04 | -2.04e+04 | -2.02e+04 | 557 | 23036 | 23054 |
## GVIF Df GVIF^(1/(2*Df))
## prop_type_simplified 1.501568 4 1.052127
## number_of_reviews 1.136203 1 1.065928
## review_scores_rating 1.030892 1 1.015329
## room_type 1.286594 3 1.042894
## bathrooms 1.333482 1 1.154765
## bedrooms 2.867499 1 1.693369
## accommodates 2.937274 1 1.713848
## neighbourhood_simplified 1.151978 4 1.017842
## host_is_superhost 1.123124 1 1.059775

One can see that the neighbourhood of an apartment is an predictor of the price for 4 nights in Copenhagen.
Interpretation of neighbourhood predictor:
We can see that all the neighbourhood variables are significant predictors of the price for 4 nights in Copenhagen. Taking Amager as the base, Brokvarterer is asscociated with 0.03 more in log10(price_4_nights); Center is asscociated with 0.16 more in log10(price_4_nights); Nordvest is asscociated with 0.09 drop in log10(price_4_nights); West is asscociated with 0.09 drop in log10(price_4_nights).
Model 6: 50%
What is the effect of cancellation_policy on price_4_nights, after we control for other variables?
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 3.09 | 0.0153 | 202 | 0 | 3.06 | 3.12 |
| prop_type_simplifiedCondominium | 0.0124 | 0.00415 | 2.99 | 0.00284 | 0.00425 | 0.0205 |
| prop_type_simplifiedHouse | 0.0189 | 0.00566 | 3.35 | 0.000822 | 0.00784 | 0.03 |
| prop_type_simplifiedOther | 0.0734 | 0.00712 | 10.3 | 6.64e-25 | 0.0595 | 0.0874 |
| prop_type_simplifiedTownhouse | 0.0125 | 0.0077 | 1.63 | 0.104 | -0.00257 | 0.0276 |
| number_of_reviews | 0.00017 | 3.73e-05 | 4.57 | 4.99e-06 | 9.73e-05 | 0.000244 |
| review_scores_rating | 0.00131 | 0.000152 | 8.61 | 7.72e-18 | 0.00101 | 0.00161 |
| room_typeHotel room | 0.175 | 0.0279 | 6.29 | 3.18e-10 | 0.121 | 0.23 |
| room_typePrivate room | -0.17 | 0.00312 | -54.7 | 0 | -0.177 | -0.164 |
| room_typeShared room | -0.244 | 0.0199 | -12.3 | 1.19e-34 | -0.283 | -0.206 |
| bathrooms | 0.0532 | 0.00431 | 12.3 | 7.16e-35 | 0.0447 | 0.0616 |
| bedrooms | 0.0455 | 0.00199 | 22.9 | 1.16e-114 | 0.0416 | 0.0494 |
| accommodates | 0.0359 | 0.00108 | 33.2 | 1.05e-235 | 0.0338 | 0.038 |
| neighbourhood_simplifiedBrokvarterer | 0.0266 | 0.00292 | 9.11 | 9.1e-20 | 0.0209 | 0.0324 |
| neighbourhood_simplifiedCenter | 0.158 | 0.00381 | 41.5 | 0 | 0.151 | 0.166 |
| neighbourhood_simplifiedNordvest | -0.0898 | 0.00447 | -20.1 | 5.59e-89 | -0.0986 | -0.0811 |
| neighbourhood_simplifiedWest | -0.0805 | 0.00496 | -16.2 | 6.35e-59 | -0.0903 | -0.0708 |
| cancellation_policymoderate | 0.0269 | 0.00241 | 11.2 | 7.93e-29 | 0.0222 | 0.0316 |
| cancellation_policystrict_14_with_grace_period | 0.0457 | 0.00262 | 17.4 | 9.54e-68 | 0.0405 | 0.0508 |
| cancellation_policysuper_strict_60 | 0.127 | 0.0173 | 7.36 | 1.88e-13 | 0.0935 | 0.161 |
| host_is_superhostTRUE | 0.0294 | 0.00329 | 8.94 | 4.11e-19 | 0.023 | 0.0359 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.507 | 0.507 | 0.154 | 1.19e+03 | 0 | 20 | 1.04e+04 | -2.07e+04 | -2.05e+04 | 548 | 23033 | 23054 |
## GVIF Df GVIF^(1/(2*Df))
## prop_type_simplified 1.793107 4 1.075724
## number_of_reviews 1.153751 1 1.074128
## review_scores_rating 1.037877 1 1.018762
## room_type 1.312570 3 1.046374
## bathrooms 1.335585 1 1.155675
## bedrooms 2.871572 1 1.694571
## accommodates 2.943557 1 1.715680
## neighbourhood_simplified 1.161450 4 1.018885
## cancellation_policy 1.268876 3 1.040487
## host_is_superhost 1.124756 1 1.060545

The cancellation policy is a significant predictor on price, with a flexible policy commanding the lowest price and with a strict policy commanding the highest price. This is surprising since usually hosts should be rewarded for the risk that they are taking with a flexible cancellation policy. On the other hand, usually expensive and nice places have a stricter cancellation policy since the cost of having the place empty for a night is relatively higher. Thus, in the end the cancellation policy might rather be a consequence of the price and not a predictor of it.
Model 7: 51%
After creating all these models, we now find to find our best model, our final model for predictions.
| names | model1 | model2 | model3 | model4 | model5 | model6 | model7 | |
|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
|
|
|
||
| 1 | (Intercept) | 3.24392507415685 | 3.35597346847575 | 3.08347857081727 | 3.10248901379474 | 3.10605946777837 | 3.10196665923745 | 3.08645813658736 |
| 2 | (0.019785504271002) | (0.0177105018892755) | (0.016276208523874) | (0.0163893980412307) | (0.0164879271515536) | (0.0153690108711702) | (0.0153018352265915) | |
| 3 | prop_type_simplifiedCondominium | 0.0330445977032774 | 0.0168016350317635 | 0.0105699548454743 | 0.00987868517014782 | 0.00975345382308558 | 0.0143000030955854 | 0.0123851550009953 |
| 4 | (0.00575813618828062) | (0.00513368778722317) | (0.00452377311003556) | (0.00451528451130815) | (0.00451544677657339) | (0.00417845159775206) | (0.00414897595879142) | |
| 5 | prop_type_simplifiedHouse | 0.118500630947505 | 0.125853472231933 | -0.048122795979808 | -0.0477713331755052 | -0.0477875430095159 | 0.0188972706992627 | 0.0189301706164228 |
| 6 | (0.00693343909548727) | (0.00617700475016833) | (0.00593246345892025) | (0.00592078546674152) | (0.00592041924862936) | (0.00569917443777188) | (0.00565789372167747) | |
| 7 | prop_type_simplifiedOther | 0.177630584440358 | 0.167906265738652 | 0.0904593812783645 | 0.0890228067608967 | 0.0884173313350716 | 0.0909957339198253 | 0.0734156433999879 |
| 8 | (0.00871702130046473) | (0.00796730372292408) | (0.00709495502466223) | (0.00708229356288155) | (0.00708849112645184) | (0.00656888883106142) | (0.00711550292038246) | |
| 9 | prop_type_simplifiedTownhouse | 0.176808408361145 | 0.165831841966319 | 0.00227446983316441 | 0.00141494263047994 | 0.00102840247149408 | 0.0141011413539161 | 0.012511462281442 |
| 10 | (0.0101610530565199) | (0.00905255104294122) | (0.0083436073381939) | (0.00832741027414538) | (0.00832919038409289) | (0.0077521651472995) | (0.0076953742155565) | |
| 11 | number_of_reviews | 0.000188330201480727 | 0.000667782655285451 | 0.000598193401154432 | 0.000483693533257583 | 0.000484593393274765 | 0.000231256455875061 | 0.000170461188979712 |
| 12 | (4.84422194236793e-05) | (4.36385685344027e-05) | (3.8493828208453e-05) | (4.01164598762268e-05) | (4.01165312963734e-05) | (3.73285994729104e-05) | (3.73320152471178e-05) | |
| 13 | review_scores_rating | 0.00235009643422962 | 0.0015423644383559 | 0.00167274241022833 | 0.00145496933716715 | 0.00146129812121523 | 0.00131618239499806 | 0.00131040246962515 |
| 14 | (0.000207227653216638) | (0.000185179019346473) | (0.000163563398183893) | (0.000165063256670083) | (0.000165084038411034) | (0.000152822530062709) | (0.00015218268306063) | |
| 15 | room_typeHotel room | 0.112285988624123 | 0.22301927896014 | 0.225869771051116 | 0.226710460119289 | 0.13865058802833 | 0.175495746351611 | |
| 16 | (0.0342344822479748) | (0.0301985215196532) | (0.0301395309019716) | (0.0301406480832505) | (0.0279217966010478) | (0.0278891452739544) | ||
| 17 | room_typePrivate room | -0.270789209925544 | -0.180623782503831 | -0.182832632444119 | -0.182882692122035 | -0.175125354598853 | -0.17048232205997 | |
| 18 | (0.0035501721852792) | (0.00337763384528402) | (0.00337940654477376) | (0.00337928946750577) | (0.00313097759430856) | (0.00311832460795075) | ||
| 19 | room_typeShared room | -0.362231963884556 | -0.261799894388423 | -0.261145982744209 | -0.260898244177251 | -0.245936903474211 | -0.244472972454319 | |
| 20 | (0.0245657858841296) | (0.0216896324897146) | (0.0216465638670587) | (0.0216455681370037) | (0.0200246027216212) | (0.0198787022168316) | ||
| 21 | bathrooms | 0.0581148232975267 | 0.0581555336903183 | 0.0582072586519383 | 0.0540608414234523 | 0.0531546451526248 | ||
| 22 | (0.00469701514858695) | (0.00468769789531073) | (0.0046874767102729) | (0.00433704214934734) | (0.00430771960741996) | |||
| 23 | bedrooms | 0.0488418378223484 | 0.0491489259122079 | 0.0491493594027497 | 0.0465903045220969 | 0.0454694460632258 | ||
| 24 | (0.00216494638308704) | (0.00216094267679201) | (0.00216080694777852) | (0.00200028455541194) | (0.00198660539826041) | |||
| 25 | accommodates | 0.0376285344723029 | 0.037391132785231 | 0.0374321814646748 | 0.0363873262506188 | 0.0359206096397687 | ||
| 26 | (0.00117922096396024) | (0.00117713781811812) | (0.00117724763518589) | (0.00109002725321269) | (0.0010829613387494) | |||
| 27 | host_is_superhostTRUE | 0.0353701343498818 | 0.035619157372872 | 0.0311389320195054 | 0.0294131967812591 | |||
| 28 | (0.00357928311454013) | (0.0035812818989754) | (0.00331201322816401) | (0.00328941708379988) | ||||
| 29 | is_location_exactTRUE | -0.00546855444066015 | ||||||
| 30 | (0.00277095811418745) | |||||||
| 31 | neighbourhood_simplifiedBrokvarterer | 0.0273693670752078 | 0.026626715979707 | |||||
| 32 | (0.0029452620587076) | (0.0029236149180782) | ||||||
| 33 | neighbourhood_simplifiedCenter | 0.161454102535627 | 0.15808924854407 | |||||
| 34 | (0.00383407291109753) | (0.00381272000539777) | ||||||
| 35 | neighbourhood_simplifiedNordvest | -0.0911168532312485 | -0.0898351964873918 | |||||
| 36 | (0.00450570657108232) | (0.00447238366852371) | ||||||
| 37 | neighbourhood_simplifiedWest | -0.0835991898153753 | -0.0805274835870511 | |||||
| 38 | (0.00499555945047542) | (0.00496098639188035) | ||||||
| 39 | cancellation_policymoderate | 0.026915441623387 | ||||||
| 40 | (0.00241262985254636) | |||||||
| 41 | cancellation_policystrict_14_with_grace_period | 0.045671774015651 | ||||||
| 42 | (0.00261742648723956) | |||||||
| 43 | cancellation_policysuper_strict_60 | 0.127383509237797 | ||||||
| 44 | (0.0173043275909361) | |||||||
| 1.1 | #observations | 23076 | 23076 | 23060 | 23054 | 23054 | 23054 | 23054 |
| 2.1 | R squared | 0.0459531953111049 | 0.243048634204171 | 0.412790992447907 | 0.415033368314356 | 0.415132241656385 | 0.4996523643904 | 0.507236552653042 |
| 3.1 | Adj. R Squared | 0.0457050579480577 | 0.242753283372117 | 0.412485247314457 | 0.414703309016964 | 0.414776837836046 | 0.499283120172421 | 0.506808676607936 |
| 4.1 | Residual SE | 0.214628235815306 | 0.191189591060171 | 0.168435142562142 | 0.168097971640982 | 0.168087412523062 | 0.15547857615436 | 0.154305765149136 |
Model 7 is the best model with the highest R2 so this will be the final model.
Let’s print the final model (no. 7).
## Estimate Std. Error t value
## (Intercept) 3.086e+00 1.530e-02 201.705
## prop_type_simplifiedCondominium 1.239e-02 4.149e-03 2.985
## prop_type_simplifiedHouse 1.893e-02 5.658e-03 3.346
## prop_type_simplifiedOther 7.342e-02 7.116e-03 10.318
## prop_type_simplifiedTownhouse 1.251e-02 7.695e-03 1.626
## number_of_reviews 1.705e-04 3.733e-05 4.566
## review_scores_rating 1.310e-03 1.522e-04 8.611
## room_typeHotel room 1.755e-01 2.789e-02 6.293
## room_typePrivate room -1.705e-01 3.118e-03 -54.671
## room_typeShared room -2.445e-01 1.988e-02 -12.298
## bathrooms 5.315e-02 4.308e-03 12.339
## bedrooms 4.547e-02 1.987e-03 22.888
## accommodates 3.592e-02 1.083e-03 33.169
## neighbourhood_simplifiedBrokvarterer 2.663e-02 2.924e-03 9.107
## neighbourhood_simplifiedCenter 1.581e-01 3.813e-03 41.464
## neighbourhood_simplifiedNordvest -8.984e-02 4.472e-03 -20.087
## neighbourhood_simplifiedWest -8.053e-02 4.961e-03 -16.232
## cancellation_policymoderate 2.692e-02 2.413e-03 11.156
## cancellation_policystrict_14_with_grace_period 4.567e-02 2.617e-03 17.449
## cancellation_policysuper_strict_60 1.274e-01 1.730e-02 7.361
## host_is_superhostTRUE 2.941e-02 3.289e-03 8.942
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## prop_type_simplifiedCondominium 0.002838 **
## prop_type_simplifiedHouse 0.000822 ***
## prop_type_simplifiedOther < 2e-16 ***
## prop_type_simplifiedTownhouse 0.103997
## number_of_reviews 4.99e-06 ***
## review_scores_rating < 2e-16 ***
## room_typeHotel room 3.18e-10 ***
## room_typePrivate room < 2e-16 ***
## room_typeShared room < 2e-16 ***
## bathrooms < 2e-16 ***
## bedrooms < 2e-16 ***
## accommodates < 2e-16 ***
## neighbourhood_simplifiedBrokvarterer < 2e-16 ***
## neighbourhood_simplifiedCenter < 2e-16 ***
## neighbourhood_simplifiedNordvest < 2e-16 ***
## neighbourhood_simplifiedWest < 2e-16 ***
## cancellation_policymoderate < 2e-16 ***
## cancellation_policystrict_14_with_grace_period < 2e-16 ***
## cancellation_policysuper_strict_60 1.88e-13 ***
## host_is_superhostTRUE < 2e-16 ***
##
## Residual standard error: 0.1543 on 23033 degrees of freedom
## (5469 observations deleted due to missingness)
## Multiple R-squared: 0.5072, Adjusted R-squared: 0.5068
## F-statistic: 1185 on 20 and 23033 DF, p-value: < 2.2e-16
| term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| intercept | 3.09 | 0.015 | 202 | 0 | 3.06 | 3.12 |
| prop_type_simplifiedCondominium | 0.012 | 0.004 | 2.98 | 0.003 | 0.004 | 0.021 |
| prop_type_simplifiedHouse | 0.019 | 0.006 | 3.35 | 0.001 | 0.008 | 0.03 |
| prop_type_simplifiedOther | 0.073 | 0.007 | 10.3 | 0 | 0.059 | 0.087 |
| prop_type_simplifiedTownhouse | 0.013 | 0.008 | 1.63 | 0.104 | -0.003 | 0.028 |
| number_of_reviews | 0 | 0 | 4.57 | 0 | 0 | 0 |
| review_scores_rating | 0.001 | 0 | 8.61 | 0 | 0.001 | 0.002 |
| room_typeHotel room | 0.175 | 0.028 | 6.29 | 0 | 0.121 | 0.23 |
| room_typePrivate room | -0.17 | 0.003 | -54.7 | 0 | -0.177 | -0.164 |
| room_typeShared room | -0.244 | 0.02 | -12.3 | 0 | -0.283 | -0.206 |
| bathrooms | 0.053 | 0.004 | 12.3 | 0 | 0.045 | 0.062 |
| bedrooms | 0.045 | 0.002 | 22.9 | 0 | 0.042 | 0.049 |
| accommodates | 0.036 | 0.001 | 33.2 | 0 | 0.034 | 0.038 |
| neighbourhood_simplifiedBrokvarterer | 0.027 | 0.003 | 9.11 | 0 | 0.021 | 0.032 |
| neighbourhood_simplifiedCenter | 0.158 | 0.004 | 41.5 | 0 | 0.151 | 0.166 |
| neighbourhood_simplifiedNordvest | -0.09 | 0.004 | -20.1 | 0 | -0.099 | -0.081 |
| neighbourhood_simplifiedWest | -0.081 | 0.005 | -16.2 | 0 | -0.09 | -0.071 |
| cancellation_policymoderate | 0.027 | 0.002 | 11.2 | 0 | 0.022 | 0.032 |
| cancellation_policystrict_14_with_grace_period | 0.046 | 0.003 | 17.4 | 0 | 0.041 | 0.051 |
| cancellation_policysuper_strict_60 | 0.127 | 0.017 | 7.36 | 0 | 0.093 | 0.161 |
| host_is_superhostTRUE | 0.029 | 0.003 | 8.94 | 0 | 0.023 | 0.036 |
| r_squared | adj_r_squared | mse | rmse | sigma | statistic | p_value | df | nobs |
|---|---|---|---|---|---|---|---|---|
| 0.507 | 0.507 | 0.0238 | 0.154 | 0.154 | 1.19e+03 | 0 | 20 | 2.31e+04 |
## GVIF Df GVIF^(1/(2*Df))
## prop_type_simplified 1.793107 4 1.075724
## number_of_reviews 1.153751 1 1.074128
## review_scores_rating 1.037877 1 1.018762
## room_type 1.312570 3 1.046374
## bathrooms 1.335585 1 1.155675
## bedrooms 2.871572 1 1.694571
## accommodates 2.943557 1 1.715680
## neighbourhood_simplified 1.161450 4 1.018885
## cancellation_policy 1.268876 3 1.040487
## host_is_superhost 1.124756 1 1.060545
Property Type Apartments have the lowest price. Compared to an apartment, a town house has a 1.2% higher price, a house a 1.9% higher price and a condominium a 1.2% higher price.“Other” property types have a 7.6% higher price than apartments, which makes sense since it includes special properties such as house boats etc.
Number of Reviews Already one additional review on AirBnB increases the price of the place 0.02%
Review_scores_rating For every increase in the review score rating, the price rises by 0.13%
Room Type A shared room and private room will have the lowest prices, with 21.69% and 15.6% respectively less than an entire home. This is intuitive since you have to share rooms and/or share facilities, such as bathroom and kitchen with other people. Hotel rooms have the highest price with around 19% more than an entire home.
Bathrooms & Bedrooms The number of bathrooms and bedrooms also have an influence on the price. An additional bathroom will increase the price by 5.4% and an additional bedroom by 4.6%. (both by roughly 5%).
Accommodates Offering an additional accommodate rises the price by 3.6%
Superhost Being a super host can increase your price by 2.7%, so it might be worth it to make to superhost!
Neighbourhoods Looking at the neighbourhoods, we can also see clear trends there. Highest prices are found in the center, which is the case in most cities. There are 17.2% higher than in Amager. Prices in Brokvarterer are 2.7% higher than in Amager. The lowest prices are in Nordvest and the West (8.6% and 7.7% lower than in Amager), which are more residential areas.
Cancellation Policy Places with flexible cancellation policies have the lowest prices, those with moderate ones are 2.7% higher and those with strict ones 4.9%. The super strict ones have the highest prices with 13.5%
In conclusion, this model can help AirBnB hosts understand what drives AirBnB prices, e.g. which in neighborhoods one can demand higher prices, and learn which features customer accept to pay more for .
We are planning to visit Copenhagen and want to stay in an AirBnB. We would like to look at AirBnB’s that are apartments with a private room, have at least 10 reviews, and an average rating of at least 90.
We will predict the total cost to stay at this AirBnB for 4 nights, including the appropriate 95% interval with the prediction.
Next we visualise the data to have a better understanding of the structure.




## [1] 0.01582266
## [1] 0.0316474
| mean_price | median_price |
|---|---|
| 1.98e+03 | 1.9e+03 |
As the price distribution is right skewed, choosing mean as measure to predict the final price wouldn’t be economical so we’ve decided to use the median because it fits our budget.
To conclude, we can predict, with 95% confidence, that a median listing on AirBnB in Copenhagen, for 2 people and 4 nights, will cost between 1,860.18 DKK and 1,938.58 DKK.